<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Bivariate Data Set with 3 Clusters</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head><body>

<table width="100%" summary="page for xclara"><tr><td>xclara</td><td style="text-align: right;">R Documentation</td></tr></table>

<h2>Bivariate Data Set with 3 Clusters</h2>

<h3>Description</h3>

<p>An artificial data set consisting of 3000 points in 3 quite well-separated
clusters.
</p>


<h3>Usage</h3>

<pre>data(xclara)</pre>


<h3>Format</h3>

<p>A data frame with 3000 observations on 2 numeric variables (named
<code>V1</code> and <code>V2</code>) giving the
<i>x</i> and <i>y</i> coordinates of the points, respectively.
</p>


<h3>Note</h3>

<p>Our version of the <code>xclara</code> is slightly more rounded than the one
from <code>read.table("xclara.dat")</code> and the relative
difference measured by <code>all.equal</code> is <code>1.15e-7</code> for
<code>V1</code> and <code>1.17e-7</code> for <code>V2</code> which suggests that our
version has been the result of a <code>options(digits = 7)</code>
formatting.
</p>
<p>Previously (before May 2017), it was claimed the three cluster were
each of size 1000, which is clearly wrong.  <code>pam(*, 3)</code>
gives cluster sizes of 899, 1149, and 952, which apart from seven
&ldquo;outliers&rdquo; (or &ldquo;mislabellings&rdquo;) correspond to
observation indices <i>1:900</i>, <i>901:2050</i>, and
<i>2051:3000</i>, see the example.
</p>


<h3>Source</h3>

<p>Sample data set accompanying the reference below (file
&lsquo;<span class="file">xclara.dat</span>&rsquo; in side &lsquo;<span class="file">clus_examples.tar.gz</span>&rsquo;).
</p>


<h3>References</h3>

<p>Anja Struyf, Mia Hubert &amp; Peter J. Rousseeuw (1996)
Clustering in an Object-Oriented Environment.
<em>Journal of Statistical Software</em> <b>1</b>.
doi: <a href="https://doi.org/10.18637/jss.v001.i04">10.18637/jss.v001.i04</a>
</p>


<h3>Examples</h3>

<pre>
## Visualization: Assuming groups are defined as {1:1000}, {1001:2000}, {2001:3000}
plot(xclara, cex = 3/4, col = rep(1:3, each=1000))
p.ID &lt;- c(78, 1411, 2535) ## PAM's medoid indices  == pam(xclara, 3)$id.med
text(xclara[p.ID,], labels = 1:3, cex=2, col=1:3)

 px &lt;- pam(xclara, 3) ## takes ~2 seconds
 cxcl &lt;- px$clustering ; iCl &lt;- split(seq_along(cxcl), cxcl)
 boxplot(iCl, range = 0.7, horizontal=TRUE,
         main = "Indices of the 3 clusters of  pam(xclara, 3)")

 ## Look more closely now:
 bxCl &lt;- boxplot(iCl, range = 0.7, plot=FALSE)
 ## We see 3 + 2 + 2 = 7  clear "outlier"s  or "wrong group" observations:
 with(bxCl, rbind(out, group))
 ## out   1038 1451 1610   30  327  562  770
 ## group    1    1    1    2    2    3    3
 ## Apart from these, what are the robust ranges of indices? -- Robust range:
 t(iR &lt;- bxCl$stats[c(1,5),])
 ##    1  900
 ##  901 2050
 ## 2051 3000
 gc &lt;- adjustcolor("gray20",1/2)
 abline(v = iR, col = gc, lty=3)
 axis(3, at = c(0, iR[2,]), padj = 1.2, col=gc, col.axis=gc)

</pre>


</body></html>
